(54) Information handling system including effective address translation for one or more
auxiliary processors

(57) An information handling system which efficiently
processes auxiliary functions such as graphics
processing includes one or more processors, a high
speed processor bus connecting the one or more
processors, a memory controller for controlling memory
and for controlling the auxiliary function processing, a
memory system, and an I/O bus having one or more I/O
controllers with I/O devices connected thereto, where
the memory controller includes a command buffer for
storing a command block, a translation lookaside buffer
(TLB), table walk logic, and a page table buffer.

Description

TECHNICAL FIELD 

The present invention relates to information handling
systems and, more particularly, to information handling
systems including efficient means for address
translation between virtual and real addresses for use
by an auxiliary function processor.

BACKGROUND OF THE INVENTION 

Most modern computer systems today use a concept of
virtual memory wherein there is more memory
available to the application programs than really exists
in the machine (so-called real memory). This memory is
called virtual because the operating system and hardware
let the application think this memory is there, when
in reality it may not exist in physical memory accessible
by the processor(s) but is instead allocated out on the
system hard disk. The hardware and software translate the
virtual addresses used by the program into addresses where
the memory really is, either in real physical memory or
somewhere out on the hard disk. This translation is done
on a so-called page unit basis, a page typically being
4K bytes.
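
The page-unit translation described above can be sketched as follows. This is a minimal illustration, not the patent's mechanism: the names `split_address`, `translate`, and `page_map` are hypothetical, and a plain dictionary stands in for the real translation tables.

```python
PAGE_SIZE = 4096                          # 4K-byte page, as stated in the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 low-order bits select a byte in the page


def split_address(virtual_address):
    """Return (virtual page number, byte offset within the page)."""
    return virtual_address >> OFFSET_BITS, virtual_address & (PAGE_SIZE - 1)


def translate(virtual_address, page_map):
    """Translate with a dict of virtual page number -> real page number.

    Returns None when the page is not resident in real memory
    (in a real system it would then be looked for on disk).
    """
    virtual_page, offset = split_address(virtual_address)
    real_page = page_map.get(virtual_page)
    if real_page is None:
        return None
    # Only the page number changes; the offset is carried over unchanged.
    return (real_page << OFFSET_BITS) | offset
```

Only the page number is translated; the 12-bit offset within the page passes through unchanged.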

These translations are kept in the processor hardware
in a translation lookaside buffer (TLB) because
they are done constantly and need to be done rapidly.
When a page is accessed by a processor and it is not
in real memory, a page fault interrupt occurs and the
software brings in the page from disk and maps it to a
real page in memory. If there is no empty real memory
space to put that page in from the disk, the software first
selects a page to be copied to the disk, freeing up space,
before replacing it with the page from the disk. This is
called page swapping. In order to remove a real page
from memory, the software changes the hardware translation
buffers (TLBs) so that the old virtual addresses no
longer map to their old real page location. This is called
invalidating the TLB. If that virtual address is then
referenced, a page fault is taken and the software then
knows that the page is not in real memory and that it must
look for it on the hard disk. When the new page is brought
in from the disk, the TLB is then changed to map the new
virtual address to that real page address in memory.
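
The page fault and page swapping sequence above can be illustrated with a small simulation. This is a sketch under stated assumptions: the class `PagedMemory` is hypothetical, a single dictionary stands in for both the TLB and the page table, and the oldest-mapping-first victim choice is an assumption (the text does not name a replacement policy).

```python
class PagedMemory:
    """Toy model of demand paging with TLB invalidation on swap-out."""

    def __init__(self, real_page_count):
        if real_page_count < 1:
            raise ValueError("need at least one real page")
        self.free_pages = list(range(real_page_count))
        self.tlb = {}         # virtual page -> real page, resident pages only
        self.disk = set()     # virtual pages currently swapped out
        self.page_faults = 0

    def access(self, virtual_page):
        """Return the real page for virtual_page, swapping it in if needed."""
        if virtual_page in self.tlb:
            return self.tlb[virtual_page]          # translation hit
        self.page_faults += 1                      # page fault interrupt
        if not self.free_pages:
            # Assumed policy: evict the oldest mapping (dicts keep insertion order).
            victim = next(iter(self.tlb))
            real_page = self.tlb.pop(victim)       # invalidate the old translation
            self.disk.add(victim)                  # copy the victim out to disk
            self.free_pages.append(real_page)
        real_page = self.free_pages.pop(0)
        self.disk.discard(virtual_page)            # bring the page in from disk
        self.tlb[virtual_page] = real_page         # map the new translation
        return real_page
```

With two real pages, touching virtual pages 10, 11, 10, 12 causes three faults, and the third fault swaps page 10 out to make room for page 12.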

Today's computer systems also consist of one or
more processors, each having a cache memory which
contains a copy of recently used data from real memory
to speed up execution. When a processor fetches or
stores data to memory, the data is loaded or saved in its
cache. A similar technique is used to save data back to
memory when not recently used and to update a section
of the cache with data currently being accessed by the
processor(s). This is usually done entirely in hardware
for speed.

When a processor is accessing cached data, it
causes no external bus or memory activity and,
therefore, is extremely efficient.



In these types of computer systems, several alternatives
currently exist for moving data between memory
(or a processor cache when data may be modified in a
processor cache) and an I/O device. The first alternative
is to have the processor issue loads and then stores
directly to the devices using PIO (programmed I/O). The
processor accesses memory (or cache) using a Load
instruction into one of its internal registers. The hardware
translates the virtual address using the TLB and
gets the data from the real memory (cache) location. As
noted above, a page fault will occur if the data is not
presently in real memory, and the OS software will swap
the data in and then the access will occur. Once the data
is in the processor register, it is then written to the I/O
device using a store to the I/O location. (The reverse
procedure is used if the I/O device is the source of the
data and the memory is the target.)

This method, although simple in programming
terms, has the drawback of consuming many processor
cycles since the processor is slowed by the speed of the
I/O device, as well as consuming system bus and I/O
bus bandwidth since there are no burst transfers
available, and the transfers are limited to the processor
operand sizes (words, double words, etc.). Transferring a
4K page of data in this manner would require a thousand
such operations using the typical word size operand
loads and stores.
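
The "thousand such operations" figure can be checked with a short calculation. The sketch below assumes a 4-byte word (the usual word size on 32-bit PowerPC processors; the text itself does not state the width), and `pio_transfer_cost` is a hypothetical helper name.

```python
def pio_transfer_cost(transfer_bytes, operand_bytes):
    """Return (operations, instructions) for a programmed I/O copy.

    Each operation is one load from memory followed by one store to the
    device, so the instruction count is twice the operation count.
    """
    operations = -(-transfer_bytes // operand_bytes)  # ceiling division
    return operations, 2 * operations


# A 4K page moved one 4-byte word at a time (word size is an assumption).
print(pio_transfer_cost(4096, 4))  # (1024, 2048)
```

Under that assumption a single page costs 1024 load/store pairs, each paced by the I/O device rather than by the processor.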

Another common alternative is to use Direct Memory
Access (DMA) to transfer blocks of data from memory
to I/O or vice versa. This has the advantage over the
first alternative of saving many CPU cycles, using more
efficient burst transfers and potentially not using the
system bus bandwidth if, due to the system organization,
the traffic can be kept off of the main system
(processor/memory) bus; however, there is still a large
processor overhead involved in the DMA setup, as will be
explained below, and in handling the terminating interrupt,
which again involves the OS kernel.

The DMA setup is complicated by two facts. First, when
an application wishes to write data to or read data from
I/O using one of its virtual pages, the I/O DMA devices do
not typically understand these virtual addresses. Second,
the location of the data is uncertain: it may be in memory
or on the hard disk. As noted before, the OS software may
have temporarily swapped an application's data page out
to disk.

To set up a DMA transfer requires the processor to
get the source (or target) memory address, translated
from a virtual address to a real memory address, and
then get the OS software to "pin" the real page in memory
while the transfer is taking place. Both of these
operations involve an OS kernel call which can be
expensive in processor cycles. The "pinning" operation is
for the real page manager to mark the real page unavailable
to be paged out to disk and not be replaced by the OS
software. If the page were allowed to be replaced, the I/O
device could transfer data to an application other than
the one requesting the transfer, with disastrous results.

For data intensive transfers such as graphics
screen painting or multimedia device transfers, the CPU
overhead or system bus bandwidth is the limiting factor.

In the PowerPC (PowerPC is a registered trademark
of International Business Machines Corporation)
processors, there are two external device control
instructions which act like a load or store to memory, in
that the processor translates a virtual address to a real
physical address and places it on the address bus of the
processor, and then either loads a register with a word
of data from its (system) data bus or stores a word of
data from a GPR to the processor (system) data bus. In
addition, however, these instructions source a resource
identification parameter (RID) along with these
operations using additional pins. The PowerPC architecture
provides up to a five-bit field for this purpose which could
allow up to 32 resources in a system. This RID can be
used as an address to select a resource which uses the
physical address on the address bus and the data on
the data bus for its own unique purposes. These
instructions are called external control out word (ecowx)
for the store type instruction and external control in word
(eciwx) for the load type instruction.
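
How a five-bit RID selects one of up to 32 resources can be sketched as below. This is a purely illustrative bus model: `ExternalBus`, `attach`, and `control_out` are hypothetical names, and nothing here reproduces the actual PowerPC bus protocol or instruction encoding.

```python
RID_BITS = 5                    # width of the resource ID field given in the text
MAX_RESOURCES = 1 << RID_BITS   # 2**5 = 32 selectable resources


class ExternalBus:
    """Hypothetical model: resources register under a RID and receive transfers."""

    def __init__(self):
        self.resources = {}

    def attach(self, rid, handler):
        """Register a resource; the RID must fit in the five-bit field."""
        if not 0 <= rid < MAX_RESOURCES:
            raise ValueError("RID does not fit in the 5-bit field")
        self.resources[rid] = handler

    def control_out(self, rid, real_address, word):
        """Store-type transfer (modelled loosely on ecowx).

        The RID selects the resource, which then interprets the real
        address and the data word for its own purposes.
        """
        return self.resources[rid](real_address, word)
```

A resource attached under RID 3 sees only the transfers issued with RID 3, along with the already translated real address and the data word.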

Auxiliary processing functions such as graphics
processing have generally been performed by relatively
expensive graphics adapters for three dimensional pixel
processing in prior art systems.

Recent prior art software developments have allowed
graphics processing to be handled by the main
processor complex in an information handling system.
However, the result was not totally satisfactory since
graphics pipeline processing does not adapt well to a
normal CPU architecture.

An inherent problem in such systems is that as software
in the main processor complex builds command
blocks for the auxiliary function processing such as
graphics processing, it references effective addresses
(the processor's address translation mechanism is not
used). When the command block is received by the
graphics processor, it contains references to effective,
rather than real, addresses. Since the auxiliary function
processor is located in a controller such as the memory
controller connected between the processor bus and the
memory system, means are required to translate the
effective addresses to real addresses so that the memory
controller, for example, can access the data referenced
in the command blocks.

SUMMARY OF THE INVENTION 

Therefore, it is an object of the present invention to
execute graphics processing functions in a graphics
processor located in a controller attached to the
processor bus including means for translating virtual (or
effective) addresses contained in command blocks to real
addresses so that access is made to real addresses in
memory.

Accordingly, an information handling system which
efficiently processes auxiliary functions such as graphics
processing includes one or more processors, a high
speed processor bus connecting the one or more
processors, a memory controller for controlling memory and
for controlling the auxiliary function processing, a
memory system, and an I/O bus having one or more I/O
controllers with I/O devices connected thereto, where the
memory controller includes a translation lookaside buffer
(TLB) for storing recently used address translations, a
comparator for comparing addresses associated with a
command block to addresses stored in said translation
lookaside buffer, and logic means for translating an
effective address associated with said command block to
a real address for use by said one or more auxiliary
function processors.

It is an advantage of the present invention that 3D
graphics processing may be efficiently accomplished by
an auxiliary function processor in a controller connected
to the processor bus, without the overhead of the
processor translating the addresses in the command blocks
to real addresses, and further calling OS routines to lock
down these addresses in real memory so that they cannot
be swapped out while they are being processed by
the auxiliary function processor.

Other features and advantages of the present in- 
vention will become apparent in the following detailed 
description of the preferred embodiment of the invention 
taken in conjunction with the accompanying drawing. 

BRIEF DESCRIPTION OF THE DRAWING 

Figure 1 is a block diagram of an information han- 
dling system in accordance with the present invention. 

Figure 2 is a block diagram showing in greater detail 
a memory controller including an auxiliary function proc- 
essor and associated logic in accordance with the 
present invention. 

DETAILED DESCRIPTION OF A PREFERRED 
EMBODIMENT OF THE INVENTION 

Referring now to Figure 1, an information handling
system 100 embodying the present invention will be
described. Multiprocessor system 100 includes a number
of processing units 102, 104, 106 operatively connected
to a system bus 108. Also connected to the system bus
108 is a memory controller 110, which controls access
to system memory 112, and I/O channel controllers 114,
116, and 118. Additionally, a high performance I/O
device 120 may be connected to the system bus 108. Each
of the system elements described 102-120, inclusive,
operates under the control of system controller 130, which
communicates with each unit connected to the system
bus 108 by point to point lines such as 132 to processor
102, 134 to processor 104, 136 to processor 106, 140
to controller 110, 144 to I/O channel controller 114, 146
to I/O channel controller 116, 148 to I/O channel
controller 118, and 150 to high performance I/O device 120.
Requests and grants of bus access are all controlled by
system controller 130.

EP 0 766 177 A1

I/O channel controller 114 controls and is connected
to system I/O subsystem and native I/O subsystem 160.

Each processor unit 102, 104, 106 may include a
processor and a cache storage device.

One of the bus devices, such as processor 102,
may request to enable an operation onto bus 108 from
system controller 130 via connection 132. Upon receiving
a bus grant from system controller 130, processor
102 will then enable an address onto bus 108.

Referring now to Figure 2, elements of the present 
invention will be described in greater detail. 

A memory controller 204 is also connected to processor
bus 14, 16 for controlling access to memory system
24 either by processors 12 or by requests from I/O
controllers 32. Memory controller 204 includes an
auxiliary function processor 206 which may be a graphics
processor. Memory controller 204 also includes a
command buffer 210 for storing commands for each
command block, a translation lookaside buffer 220 as part
of the address translation mechanism, compare circuit
222 for determining whether an effective address
presented by the graphics processor 206 matches an entry
in TLB 220, table walk logic 224, and page table buffer
226.

Address translation, and more particularly address
translation in a memory management unit including
TLBs, table walk logic and page table buffers, is
described in detail in "PowerPC 601 RISC Microprocessor
User's Manual", Revision 1, published by Motorola, Inc.,
1993, in Chapter 6, "Memory Management Unit", which
is hereby incorporated by reference herein.

The operation of an address translation mechanism 
for translating virtual addresses to real addresses for 
use in graphics processing in accordance with the 
present invention will be described. 

Address translation logic is included in memory 
controller 204 to translate effective addresses in com- 
mand blocks received by memory controller 204 from 
processor 12 for graphics processing in graphics proc- 
essor 206. The address translation logic is required, 
since the graphics processor needs to access a real ad- 
dress in memory where the information to be used in the 
graphics processing is stored or to be stored. 

The address translation logic includes a four-way
translation lookaside buffer (TLB) 220. If there is a miss,
indicated by a no compare from compare circuit 222
between addresses in TLB 220 and the effective address
presented by graphics processor 206, memory controller
204 performs a table walk of page tables in memory
system 24. The page table walk operation is described
in the above-referenced PowerPC User's Manual. Memory
controller 204 stores current copies of system
architected facilities used to walk the system page tables
when doing address translation. Memory controller 204
fetches cache lines from the system page table on an
as-needed basis to find an effective to real address
translation. If no entry is found in the system page table
226, a page fault condition is sent to the graphics engine
206.
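
The lookup order described above (four-way TLB first, table walk on a miss, page fault if no entry exists) can be sketched as follows. This is a minimal model under stated assumptions: the number of sets, the oldest-first replacement within a set, and the use of a plain dictionary in place of the hashed PowerPC page table are all assumptions, and the names `ControllerTLB`, `translate`, and `PageFault` are hypothetical.

```python
PAGE_SHIFT = 12   # 4K-byte pages
WAYS = 4          # four-way TLB, as stated in the text
SETS = 16         # number of sets is an assumption


class PageFault(Exception):
    """Raised when neither the TLB nor the page table holds a translation."""


class ControllerTLB:
    def __init__(self):
        # Each set holds up to WAYS (effective page, real page) pairs.
        self.sets = [[] for _ in range(SETS)]
        self.misses = 0

    def lookup(self, effective_page):
        """Compare the effective page against every way of the selected set."""
        for tag, real_page in self.sets[effective_page % SETS]:
            if tag == effective_page:
                return real_page
        return None   # "no compare": a TLB miss

    def fill(self, effective_page, real_page):
        entries = self.sets[effective_page % SETS]
        if len(entries) == WAYS:
            entries.pop(0)   # assumed policy: replace the oldest entry
        entries.append((effective_page, real_page))


def translate(tlb, page_table, effective_address):
    """Return the real address, walking page_table (a dict stand-in) on a miss."""
    effective_page = effective_address >> PAGE_SHIFT
    offset = effective_address & ((1 << PAGE_SHIFT) - 1)
    real_page = tlb.lookup(effective_page)
    if real_page is None:
        tlb.misses += 1
        if effective_page not in page_table:
            raise PageFault(hex(effective_address))
        real_page = page_table[effective_page]   # result of the table walk
        tlb.fill(effective_page, real_page)
    return (real_page << PAGE_SHIFT) | offset
```

A second translation of the same page is served from the TLB without another walk, which is the reason for caching recently used translations in the controller.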

Since memory controller 204 is not a processor, it
cannot deal directly with a page fault. If a page fault
condition occurs while graphics processor 206 is processing
a command block from the main processor complex
12, an error status packet reflecting the page fault
condition is passed to the processor 12 by memory
controller 204. Processor software recognizes the fault
condition and causes the faulting address to be rerun. When
processor 12 encounters the same page fault condition,
system software (outside the scope of the present
invention) resolves the condition and passes control back
to the graphics processor 206.

An additional mechanism is added to the address
translation logic to handle synchronization of reference (R)
and change (C) bits in system page table 226. Normally,
the R and C bits are updated by processor 12 and used by
the software kernel to implement page casting algorithms.
The graphics processor 206, however, does not update the
R and C bits. This raises the possibility that a page may
be updated by graphics processor 206 after that same page
has been invalidated by the software kernel resetting the
C bit. To avoid such situations, the address translation
logic checks the entries of page table buffer 226 to
verify that the C bit is set for the matching entry in the
translation table. If the C bit is not set, the page fault
mechanism described above is triggered.
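
The C bit check and the hand-off of a page fault to the processor can be sketched as below. This is an illustration only: the entry layout, the function names, and the `resolve_fault` callback (which stands in for the processor rerunning the faulting address and the system software resolving it) are hypothetical and not taken from the patent.

```python
class PageFault(Exception):
    """Signals that the controller cannot use a translation on its own."""


class PageTableEntry:
    def __init__(self, real_page, changed=False):
        self.real_page = real_page
        self.changed = changed   # the C (change) bit


def checked_translation(page_table, effective_page):
    """Return the real page only if an entry exists and its C bit is set."""
    entry = page_table.get(effective_page)
    if entry is None or not entry.changed:
        raise PageFault(effective_page)
    return entry.real_page


def translate_with_handoff(page_table, effective_page, resolve_fault):
    """On a fault, hand the address to the processor side, then retry once.

    resolve_fault is a hypothetical callback standing in for the error
    status packet, the rerun of the faulting address by the processor,
    and the system software that resolves the condition.
    """
    try:
        return checked_translation(page_table, effective_page)
    except PageFault:
        resolve_fault(page_table, effective_page)
        return checked_translation(page_table, effective_page)
```

Treating a clear C bit the same way as a missing entry keeps the controller from writing to a page whose state only the processor and the kernel are in a position to update.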

Since memory controller 204 contains TLB entries, the
memory controller must process PowerPC TLB operations.
A particular TLB command is initiated when the
software kernel is about to invalidate a page. When
memory controller 204 detects the TLB command on the
processor bus 14, 16, the translation logic flushes the
TLB 220, and any transaction utilizing a previously
translated address is forced to complete before the TLB
invalidate command can complete.
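
The ordering requirement above (in-flight transactions finish before the invalidate completes) can be sketched as follows. The class `TranslationLogic` and its fields are hypothetical, and completing a transaction is modelled simply as moving it from one list to another.

```python
class TranslationLogic:
    """Toy model of a controller TLB that drains transactions before a flush."""

    def __init__(self):
        self.tlb = {}          # effective page -> real page
        self.in_flight = []    # transactions using previously translated addresses
        self.completed = []

    def start(self, name, effective_page):
        """Begin a transaction using a translation already held in the TLB."""
        self.in_flight.append((name, self.tlb[effective_page]))

    def tlb_invalidate(self):
        """Handle a TLB invalidate command detected on the processor bus."""
        # Force every transaction that uses an old translation to complete first.
        while self.in_flight:
            self.completed.append(self.in_flight.pop(0))
        # Only then flush the TLB, so no stale real address is still in use.
        self.tlb.clear()
```

Draining before flushing means the kernel can safely reuse the real page once the invalidate command has completed.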

Although the invention has been described with
respect to a preferred embodiment which specifically
deals with graphics processing where the graphics
processing engine is embodied as a portion of a memory
controller, it will be understood by those skilled in the art
that the invention equally applies to other auxiliary
function processors which require access to processors
across the processor bus and to memory in a manner
so as to minimize interference with other processor and
memory accesses.

Accordingly, the scope of this invention is limited
only by the following claims and their equivalents.

Claims

1. An information handling system, comprising:

one or more processors;

a processor bus connecting the one or more
processors;

a memory controller for controlling memory and
for controlling one or more auxiliary function
processors;

a memory system; and

an I/O bus having one or more I/O controllers
with I/O devices connected thereto;

said memory controller comprising:

a translation lookaside buffer for storing recently
used address translations;

a comparator for comparing addresses associated
with a command block to addresses stored
in said translation lookaside buffer; and

logic means for translating an effective address
associated with said command block to a real
address for use by said one or more auxiliary
function processors.

2. An information handling system, according to claim
1, further comprising:

means for signalling a page fault condition to
a processor.

3. An information handling system, according to claim
1, further comprising:

means for testing one or more predetermined
bit positions in an entry in a page table to determine
if a page in memory is being changed by other
elements of said information handling system.

4. An information handling system, according to claim
3, further comprising:

means for triggering a page fault if said one
or more predetermined bit positions in said entry are
not active.

5. An information handling system, according to claim
4, further comprising:

means responsive to a predetermined processor
command for inhibiting execution of said
command in said memory controller until a current
transaction is completed.

6. A memory controller in an information handling
system, comprising:

a translation lookaside buffer for storing recently
used address translations;

a comparator for comparing addresses associated
with a command block to addresses stored
in said translation lookaside buffer; and

logic means for translating an effective address
associated with said command block to a real
address for use by said one or more auxiliary
function processors.

7. A memory controller according to claim 6, further
comprising:

means for signalling a page fault condition to
a processor.

8. A memory controller according to claim 6, further
comprising:

means for testing one or more predetermined
bit positions in an entry in a page table to determine
if a page in memory is being changed by other
elements of said information handling system.

9. A memory controller according to claim 8, further
comprising:

means for triggering a page fault if said one
or more predetermined bit positions in said entry are
not active.

10. A memory controller according to claim 9, further
comprising:

means responsive to a predetermined processor
command for inhibiting execution of said
command in said memory controller until a current
transaction is completed.